
STT-RAM has recently been demonstrated to be a promising candidate
to replace SRAM in on-chip caches. STT-RAM possesses many attractive characteristics, such as fast access
time and low standby power. In addition, emerging 3D integration technology provides a
cost-efficient way to integrate STT-RAM with multicore architectures/CMPs. However, one of the
systemic disadvantages of STT-RAM technology is the latency associated with write
operations. In this work, we develop architectural solutions to alleviate this problem. Our proposed network-level solution centers on
prioritizing packets destined for idle banks and delaying accesses to an STT-RAM bank that is currently servicing a write request.
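
As a rough illustration of the arbitration idea described above, the following minimal sketch shows how a router might choose among queued packets. It is not the implementation evaluated in this work: the names `Packet`, `pick_next`, and `busy_until`, and the oldest-first tie-break, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    bank: int        # destination cache bank (hypothetical field)
    is_write: bool   # True for a write request
    arrival: int     # cycle at which the packet entered the queue


def pick_next(queue, busy_until, now):
    """Return the index of the packet to forward next, or None if the queue is empty.

    busy_until maps a bank id to the cycle at which its in-flight write
    finishes (assumed bookkeeping; banks not present are treated as idle).
    Packets headed to idle banks are preferred; among equals, the oldest wins.
    Packets to busy banks are forwarded only when no idle-bank packet waits.
    """
    if not queue:
        return None

    def key(i):
        bank_busy = busy_until.get(queue[i].bank, 0) > now
        return (bank_busy, queue[i].arrival)  # False sorts before True

    return min(range(len(queue)), key=key)
```

In this sketch a packet bound for a bank that is still writing is effectively delayed until either the bank becomes idle or no other traffic is waiting; a real router would additionally need starvation avoidance, which is omitted here.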

Experimental results using a 128-node (64 cores and 64 cache banks) CMP system show that the
proposed on-chip network solution can lead to an average 14\% improvement in IPC and 54\% reduction
in energy compared to an equivalent-area SRAM implementation across a diverse set of 42 applications,
including both multi-threaded and multi-programmed workloads. Additionally, our proposal compares
favorably with a recently proposed write-buffer mechanism for hiding write latency. Overall, we believe
that our proposal is a promising way to improve the power and performance envelope of STT-RAM-based stacked CMP architectures.
